
Bringing industry standards to Open Source localisers: a case study of Virtaal

Authors: Morado Vázquez Lucía; Wolff Friedel
Publication venue: Universitat Autònoma de Barcelona
Publication date: 01/01/2011

The XML Localisation Interchange File Format (XLIFF) is an open standard that promises interoperability and tool independence. It might seem a natural fit for Open Source localisation, yet the Gettext PO format remains the de facto standard there. We present a case study of the XLIFF implementation in Virtaal, an Open Source localisation tool that supports multiple formats. Virtaal's primary target users are localisers of Open Source software, who are often volunteers. We study the implementation choices the developers made, focusing on the workflow metadata in XLIFF, and propose simplifications that we hope will make XLIFF usable by a wider audience in the future.
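To make "workflow metadata" concrete: in XLIFF 1.2, each `<trans-unit>` may carry an `approved` attribute, and its `<target>` may carry an optional `state` attribute with values such as `new`, `translated`, `needs-review-translation`, `final` or `signed-off`. The sketch below is illustrative only and is not a description of how Virtaal handles this metadata. It counts translation units by `state` using Python's standard `xml.etree.ElementTree`; the sample document and the `untranslated` and `no-state` labels are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample document with three translation units.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.txt" source-language="en" target-language="af" datatype="plaintext">
    <body>
      <trans-unit id="1" approved="yes">
        <source>Save file</source>
        <target state="final">Stoor lêer</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Open file</source>
        <target state="needs-review-translation">Maak lêer oop</target>
      </trans-unit>
      <trans-unit id="3">
        <source>Quit</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def state_counts(xliff_text: str) -> Counter:
    """Count trans-units by the workflow state on their <target> element."""
    root = ET.fromstring(xliff_text)
    counts: Counter = Counter()
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None:
            # Illustrative label for units that have no target at all.
            counts["untranslated"] += 1
        else:
            # 'state' is optional in XLIFF 1.2; 'no-state' is an illustrative fallback.
            counts[target.get("state", "no-state")] += 1
    return counts


print(state_counts(SAMPLE))
```

Even this small sample mixes three different kinds of status information (`approved`, `state`, and a missing target), which gives a sense of the metadata a tool has to interpret.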

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Dipòsit Digital de Documents de la UAB

Skoner en kleiner vertaalgeheues (Cleaner and smaller translation memories)

Author: Wolff Friedel
Publication date: 01/10/2018

Computers can play a useful role in translation. Two such approaches are translation memory systems and machine translation systems, both of which use a translation memory: a bilingual collection of previous translations. This thesis presents methods to improve the quality of a translation memory. A machine learning approach is used to identify incorrect entries in a translation memory, with learning features in three categories: features associated with text length; features computed by quality checkers such as translation checkers, a spell checker and a grammar checker; and statistical features computed with the help of external data.

The evaluation of translation memory systems is not yet standardised. This thesis points out a number of problems with existing evaluation methods and develops an improved evaluation method. Removing the incorrect entries from a translation memory yields a smaller, cleaner translation memory for applications to use. Experiments indicate that such a translation memory performs better in a translation memory system. As further evidence of the value of a cleaner translation memory, it also improves the training of a machine translation system.

School of Computing. Ph.D. (Computer Science)
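As an illustration of the first feature category (text length), the following Python sketch computes a few length-based features for a single translation memory entry. The abstract does not list the exact features, so the `TMEntry` type, the feature names, and the choice of character and word ratios are all hypothetical. In the approach described above, features like these would be one input to a machine learning classifier rather than a filter on their own.

```python
from dataclasses import dataclass


@dataclass
class TMEntry:
    """Hypothetical container for one source/target pair in a translation memory."""
    source: str
    target: str


def length_features(entry: TMEntry) -> dict[str, float]:
    """Compute illustrative length-based features for a translation memory entry."""
    src_chars = len(entry.source)
    tgt_chars = len(entry.target)
    src_words = len(entry.source.split())
    tgt_words = len(entry.target.split())
    return {
        "src_chars": float(src_chars),
        "tgt_chars": float(tgt_chars),
        # Target/source ratios; max(..., 1) avoids division by zero on empty text.
        "char_ratio": tgt_chars / max(src_chars, 1),
        "word_ratio": tgt_words / max(src_words, 1),
        "abs_char_diff": float(abs(tgt_chars - src_chars)),
    }


print(length_features(TMEntry("Save file", "Stoor lêer")))
```

An entry whose target is much shorter or longer than its source, such as a truncated translation, produces a `char_ratio` far from 1, which is the kind of signal a classifier could learn from.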

Unisa Institutional Repository

Syllabification and parameter optimisation in Zulu to English machine translation

Authors: Kotzé Gideon; Wolff Friedel
Publication venue: South African Institute of Computer Scientists and Information Technologists (SAICSIT)
Publication date: 10/12/2015

We present a series of experiments on machine translation from Zulu to English using a well-known statistical software system. Zulu is a challenging case because of its morphological complexity and the relative scarcity of resources. Against a selection of baseline models, we show that a relatively naive approach of dividing Zulu words into syllables leads to a surprising improvement. We improve on this model further through manual configuration changes. Our best model significantly outperforms the baseline models (BLEU, p < 0.001), even when they are optimised to a similar degree, and falls short only of the well-known Morfessor morphological analyser, which uses relatively sophisticated algorithms. These experiments suggest that even a simple optimisation procedure can improve the quality of this approach significantly. This is promising particularly because it improves on a mostly language-independent approach, at least within the same language family. Our work also reinforces that sub-lexical alignment is crucial for improving translation quality for Zulu.

Academy of African Languages and Science (AALS)
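The "relatively naive" syllabification step can be sketched in a few lines of Python. The abstract does not give the exact rules, so this assumes the simplest approximation for a language with mostly open syllables: split after every vowel and attach any trailing consonants as a final piece. The `@@` marker that flags word-internal boundaries is likewise a hypothetical convention, not necessarily the one used in the experiments.

```python
import re

# Assumed naive rule: a piece is a run of non-vowels followed by one vowel;
# any consonants left at the end of a word form a final piece of their own.
_SYLLABLE = re.compile(r"[^aeiou]*[aeiou]|[^aeiou]+$", re.IGNORECASE)


def syllabify(word: str) -> list[str]:
    """Split a single word after every vowel."""
    return _SYLLABLE.findall(word)


def syllabify_sentence(sentence: str, marker: str = "@@") -> str:
    """Replace each word with its pieces, marking every non-final piece with a
    hypothetical joiner so the original word boundaries stay visible."""
    out = []
    for word in sentence.split():
        pieces = syllabify(word)
        out.extend(piece + marker for piece in pieces[:-1])
        out.append(pieces[-1])
    return " ".join(out)


print(syllabify("umuntu"))
print(syllabify_sentence("ngiyabonga kakhulu"))
```

This rule ignores cases such as syllabic nasals (for example, it yields `mfa` + `na` rather than `m` + `fa` + `na` for *mfana*), which is part of what makes the approach naive, but also what keeps it largely language independent.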

Directory of Open Access Journals

Unisa Institutional Repository